57 research outputs found

    Hysteretic Q-Learning : an algorithm for decentralized reinforcement learning in cooperative multi-agent teams.

    Multi-agent systems (MAS) are a field of growing interest in domains such as robotics and distributed control. The article focuses on decentralized reinforcement learning (RL) in cooperative MAS, where a team of independent learning robots (ILs) tries to coordinate individual behaviors to reach a coherent joint behavior. We assume that each robot has no information about its teammates' actions. To date, RL approaches for such ILs have not guaranteed convergence to the optimal joint policy in scenarios where coordination is difficult. We report an investigation of existing algorithms for learning coordination in cooperative MAS and propose a Q-Learning extension for ILs, called Hysteretic Q-Learning. This algorithm does not require any additional communication between robots. Its advantages are demonstrated and compared with other methods on various applications: bimatrix games, a collaborative ball-balancing task, and the pursuit domain.
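    The abstract does not spell out the update rule. As a minimal sketch, assuming the commonly described form of Hysteretic Q-Learning in which each learner applies a larger learning rate to positive temporal-difference errors than to negative ones, one independent learner's update could look like the following. The function name, the dictionary-based Q-table, and the default values of `alpha`, `beta`, and `gamma` are illustrative assumptions, not taken from this listing.

```python
from collections import defaultdict


def hysteretic_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, beta=0.01, gamma=0.9):
    """Apply one hysteretic Q-learning step for a single independent learner.

    Assumption: alpha (used when the estimate increases) is larger than
    beta (used when it decreases), so low rewards that may be caused by
    teammates' exploration lower the estimate only slowly.
    """
    best_next = max(q[(next_state, a)] for a in actions)
    delta = reward + gamma * best_next - q[(state, action)]
    rate = alpha if delta >= 0 else beta  # the two-rate "hysteresis"
    q[(state, action)] += rate * delta
    return q[(state, action)]


# Hypothetical usage: one good outcome followed by one bad outcome.
q_table = defaultdict(float)
moves = ["a", "b"]
after_good = hysteretic_update(q_table, "s", "a", 10.0, "s", moves)
after_bad = hysteretic_update(q_table, "s", "a", -10.0, "s", moves)
```

    In this sketch the learner only sees its own action and reward, which matches the abstract's assumption that no information about teammates' actions is available.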

    Reward function and initial values : Better choices for accelerated Goal-directed reinforcement learning.

    An important issue in Reinforcement Learning (RL) is accelerating or improving the learning process. In this paper, we study the influence of some RL parameters on learning speed. Although RL convergence properties have been widely studied, no precise rules exist for correctly choosing the reward function and the initial Q-values. Our method helps in choosing these RL parameters in the context of reaching a goal in minimal time. We develop a theoretical study and also provide experimental justification for choosing, on the one hand, the reward function and, on the other hand, particular initial Q-values based on a goal bias function.
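    The abstract does not define the goal bias function, so the one below is purely hypothetical. It is a minimal sketch of the general idea of seeding a Q-table with values that grow as a state gets closer to the goal, assuming a grid world, a Manhattan distance, and a discount factor `gamma`; none of these choices come from the paper.

```python
def goal_biased_q0(state, goal, gamma=0.9):
    """Hypothetical initial Q-value: gamma raised to the Manhattan distance.

    States nearer the goal start with larger values, which biases early
    greedy choices toward the goal before any reward has been observed.
    """
    distance = abs(state[0] - goal[0]) + abs(state[1] - goal[1])
    return gamma ** distance


def init_q_table(width, height, actions, goal, gamma=0.9):
    """Build a Q-table in which every action in a state shares the bias."""
    return {
        ((x, y), a): goal_biased_q0((x, y), goal, gamma)
        for x in range(width)
        for y in range(height)
        for a in actions
    }


# Hypothetical usage on a 3x2 grid with the goal in the far corner.
q0 = init_q_table(3, 2, ["up", "down", "left", "right"], goal=(2, 1))
```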

    A study of FMQ heuristic in cooperative multi-agent games.

    The article focuses on decentralized reinforcement learning (RL) in cooperative multi-agent games, where a team of independent learning agents (ILs) tries to coordinate individual actions to reach an optimal joint action. Within this framework, several algorithms based on Q-learning have been proposed in recent work. In particular, we are interested in Distributed Q-learning, which finds optimal policies in deterministic games, and in the Frequency Maximum Q value (FMQ) heuristic, which in partially stochastic matrix games can distinguish whether a poor reward received for the same action is due to miscoordination or to a noisy reward function. Making this distinction is one of the main difficulties in solving stochastic games. Our objective is to find an algorithm able to switch between updates according to the detected cause of the noise. In this paper, we propose a modified version of the FMQ heuristic that achieves this detection and adapts the update accordingly. Moreover, this modified FMQ version is more robust and easy to configure.
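    As a rough illustration of the two baselines named above, here is a minimal sketch for a stateless matrix game. It assumes the usual descriptions of the optimistic Distributed Q-learning update and of the original FMQ evaluation; it does not reproduce the modified FMQ proposed in the paper. All names and the weight `c` are assumptions.

```python
def distributed_q_update(q, action, reward):
    """Optimistic update: never decrease the estimate for an action.

    This ignores low rewards caused by miscoordination, but it also
    latches onto lucky samples when the reward itself is noisy.
    """
    q[action] = max(q[action], reward)


def fmq_value(q, max_reward, max_count, play_count, action, c=10.0):
    """FMQ-style evaluation used for action selection.

    The Q-value is boosted by the maximum reward seen for the action,
    weighted by how often that maximum was actually obtained.
    """
    plays = play_count[action]
    freq = max_count[action] / plays if plays else 0.0
    return q[action] + c * freq * max_reward[action]


# Hypothetical usage with a single action "a".
q = {"a": 0.0}
distributed_q_update(q, "a", 5.0)
distributed_q_update(q, "a", -3.0)  # the penalty leaves the estimate unchanged

score = fmq_value({"a": 2.0}, {"a": 10.0}, {"a": 1}, {"a": 4}, "a")
```

    The frequency term is what lets an FMQ-style learner discount a high reward that was seen only rarely, which is the distinction between miscoordination and reward noise that the abstract describes.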

    Robotic Micromanipulation and Microassembly using Mono-view and Multi-scale visual servoing.

    This paper investigates sequential robotic micromanipulation and microassembly for building 3-D microsystems and devices. A mono-view, multiple-scale 2-D visual control scheme is implemented for that purpose. The imaging system is a photon video microscope with an active zoom that makes it possible to work at multiple scales. It is modelled by a non-linear projective method in which the relation between the focal length and the zoom factor is explicitly established. A distributed robotic system (xy system, z system) with a two-finger gripping system is used in conjunction with the imaging system. The experimental results demonstrate the relevance of the proposed approaches. The tasks were performed with the following accuracy: 1.4 µm for the positioning error and 0.5° for the orientation error.

    Choice of the reward function and initial values to accelerate stochastic shortest-path reinforcement learning problems.

    An important issue in reinforcement learning (RL) is improving the convergence speed of the learning process. In this article, we study the influence of certain RL parameters on learning speed. Although the convergence properties of RL have been widely studied, few precise rules exist for correctly choosing the reward function and the initial values of the Q-table. Our method helps in choosing these parameters for goal-directed problems, that is, problems whose objective is to reach a goal in minimal time. We develop a theoretical study and then provide experimental justification for choosing, on the one hand, the reward function and, on the other hand, particular initial values of the Q-table based on an influence function.

    Robust trajectory tracking and visual servoing schemes for MEMS manipulation.

    This paper focuses on automating the manipulation and assembly of microcomponents using visual feedback control. Trajectory planning and tracking methods are proposed to avoid occlusions during micropart manipulation and to increase the success rate of pick-and-place cycles. The proposed methods are validated using a five degree-of-freedom (DOF) microrobotic cell that includes a 3 DOF mobile platform, a 2 DOF micromanipulator, a gripping system, and a top-view imaging system. Promising results on the accuracy and repeatability of microball manipulation tasks are obtained and presented.

    A direct visual servoing scheme for automatic nanopositioning.

    This paper demonstrates an accurate nanopositioning scheme based on a direct visual servoing process. The technique uses only the pure image signal (photometric information) to design the visual servoing control law. In contrast to traditional visual servoing approaches that use geometric visual features (points, lines, ...), the visual feature used in the control law is the pixel intensity. The proposed approach has been tested in terms of accuracy and robustness under several experimental conditions. The results demonstrate good behavior of the control law and very good positioning accuracy. The accuracies obtained are 89 nm, 14 nm, and 0.001 degrees in the x, y, and rotational axes of a positioning platform, respectively.

    A decentralized reinforcement learning algorithm for cooperative multi-agent systems: Hysteretic Q-Learning.

    We are interested in reinforcement learning techniques for cooperative multi-agent systems. We present a new algorithm for independent agents that learns the optimal joint action in games where coordination is difficult. We motivate our approach by the decentralized nature of this algorithm, which requires no communication between agents and uses Q-tables whose size is independent of the number of agents. Conclusive tests are also carried out on repeated cooperative games and on a pursuit game.

    2-DOF Contactless Distributed Manipulation Using Superposition of Induced Air Flows.

    Many industries require contactless transport and positioning of delicate or clean objects such as silicon wafers, glass sheets, solar cells, or flat foodstuffs. The authors have presented a new form of contactless distributed manipulation using induced air flow. Previous work addressed the evaluation of the maximal velocity of transported objects and one degree-of-freedom position control of objects. This paper introduces an analytic model of the velocity field of the induced air flow as a function of the spatial configuration of vertical air jets. Two degrees-of-freedom position control is then investigated by exploiting the linearity property of the model. Finally, the model is validated under closed-loop control and the performance of the position control is evaluated.

    A new Aerodynamic traction principle for handling products on an Air Cushion.

    This paper introduces a new aerodynamic traction principle for handling delicate and clean products, such as silicon wafers, glass sheets, or flat foodstuffs. The product is carried on a thin air cushion and transported along the system by induced air flows. These induced air flows are the indirect effect of strong vertical air jets that pull the surrounding fluid. The paper provides a qualitative explanation of the operating principles and a description of the experimental device. The first experimental results with active control are presented. The maximum velocity and acceleration that can be obtained for the considered device geometry meet the requirements for industrial applications.